1 Introduction

Perhaps one of the most important examples of large-scale, data-intensive, geographically 
distributed, information systems is NASA’s Earth Observing System (EOS) Data and 
Information System (EOSDIS). EOS is a NASA mission aimed at studying the planet 
Earth. A series of satellites with scientific instruments aboard will be collecting important 
data about the Earth’s atmosphere, land, and oceans over a period of 15 years. This 
mission will generate an estimated terabyte/day of raw data which will be processed to 
generate higher level data products [2]. Raw data received from the satellites is first stored
as Level 0 (L0) data which may then be transformed after successive processing into levels
2 through 4 (L2 - L4). Data received from the satellites and the data products generated
from them will be stored at various Distributed Active Archive Centers (DAACs) located 
throughout the United States. An important component of a DAAC is the Data Server — 
the subsystem that stores and distributes data as requested by EOSDIS users. 

The Data Server stores its information using a hierarchical mass storage system that 
uses a combination of automated tape libraries and disk caches to provide cost-effective 
storage for the large volumes of data held by the Data Server. Performance studies and 
workload characterization methods and software for hierarchical mass storage systems 
are reported in [3, 5, 6, 7, 8]. 

In this paper, we present a model for the scalability analysis of the Data Server
subsystem of the EOSDIS Core System (ECS). The goal of the model is to determine whether
the planned architecture of the Data Server will support an increase in the workload, given
the possible upgrade and/or addition of processors, storage subsystems, and networks. This
analysis does not contemplate new architectures that may be needed to support higher
demands.

The remaining sections of this paper are organized as follows. Section 2 provides a
summary of the architecture of ECS’s Data Server as well as a high level description of
the Ingest and Retrieval operations as they relate to ECS’s Data Server. This description
forms the basis for the development of the scalability model of the Data Server. Section
3 presents the scalability model and the methodology used to solve it. This section
describes the structure of the scalability model, input parameters, algorithms for
computing parameters of the scalability model solver, algorithms for solving the scalability
model, and the assumptions and rationale behind these assumptions. The scalability
model takes into account the proposed hardware and software architecture. The model
is quite general and allows the modeling of data servers with numerous configurations.
Section 4 presents concluding remarks.

2 Ingest and Retrieval Operations 

This section provides a high level description of the Ingest and Retrieval workloads of 
the ECS’s Data Server. This description forms the basis for the development of a model 
to analyze the scalability of the Data Server. The scalability analysis entails determining 
whether the current architecture of the ECS Data Server supports an increase in the 
workload intensity with possibly more processing and data storage elements of possibly 
higher performance. 

2.1 Subsystems of the Data Server 

The following subsystems of the Data Server are considered in the scalability
analysis presented in this study:

Software Configuration Items: 

• Science Data Server (SDSRV): responsible for managing and providing access 
to non-document earth science data. 

• Storage Management (STMGT): stores, manages, and retrieves files on behalf 
of other SDPS components. 

Hardware Configuration Items: 

• Access Control and Management (ACMHW): supports the Ingest and Data 
Server subsystems that interact directly with users. Of particular interest here is 
the SDSRV. 

• Working Storage (WKSHW): provides high performance storage for caching 
large volumes of data on a temporary basis. 

• Data Repository (DRPHW): provides high capacity storage for long-term storage
of files.

• Distribution and Ingest Peripherals (DIPHW): supports ingest and distribution
via physical media.

2.2 Ingest Data Operation

The diagram in Figure 1 depicts the flow of control and data for the Ingest process.
We have not included the Document Repository or the Document Data Server because their
impact on scalability is small compared with that of the ingest of L0 data. Circles in the
diagram represent processes. The labels in square brackets beside each process indicate the
hardware configuration item they execute on. Bolded labels indicate hardware configuration
items that belong to the Data Server.



Figure 1: L0 Ingest Control and Data Flow.


The main aspects of the diagram of Figure 1 are discussed below:

• Incoming L0 data is first stored into the system on the Staging Disk, and then into
AMASS’s cache, the hierarchical mass storage system’s disk cache for files. The
metadata are extracted and entered into a Metadata database managed by Sybase
and the actual data are archived. This is depicted in Figure 2.



Figure 2: Data Flow Diagram for Ingest Data. (Labeled elements: Metadata Database (MDDB) and Data Archive.)

• The SDPF (Science Data Processing Function) represents the users of the Ingest
system and negotiates with the Ingest Request Manager to coordinate the transfer
of data into the Ingest system.

• Data enter the Staging Disk either through an interactive GUI or, most of the time,
from external data providers through ftp or, when both ends are on the same local
network, through direct transfer of files.

• The actual data is then transferred into AMASS’ disk cache. From the cache, the 
data migrates to robotically mounted tapes managed by AMASS. The metadata 
extracted from the data is stored into a Metadata Database managed by Sybase. 

• The SDSRV (Science Data Server) gives the metadata templates to the Ingest 
Request Manager for it to extract metadata. 

• There are two, and sometimes three, SDSRV processes and one STMGT (Storage
Management) process. The Ingest Request Manager process selects which SDSRV to
use.

The scalability analysis will, among other things, determine possible performance
bottlenecks. The staging disk, the AMASS disk cache, and the metadata extraction process
are likely candidates for bottlenecks.

2.3 Retrieval Operation

This section examines the retrieval and processing operation on L1+ data. Figure 3
depicts the flow of control and data for this operation. Circles in the diagram
represent processes. The labels in square brackets beside each process indicate the hardware
configuration item they execute on. Bolded labels indicate hardware configuration items
that belong to the Data Server.

The retrieval operation proceeds in the following three stages: 

Stage 1: Checking data and deciding what processing is required: 

• SDSRV initiates the retrieval process by notifying the Subscription Server of the 
new data arrival. 

• The Subscription Server performs a subscription check for this data and issues
the appropriate notification (e.g., an email notification).

• The Subscription Server notifies PDPS PLANG of new data arrival.

• PLANG determines (e.g., retrieves) a processing plan and, based on the processing
plan, passes the processing request to PRONG.

• PDPS PRONG connects to the appropriate SDSRV (may not be the SDSRV which 
initiated the retrieval and processing operations). 

Stage 2: Retrieving data: 

• The SDSRV requests that the Data Distribution Services CSCI (DDIST) retrieve
the data files.

• SDSRV → requests DDIST → requests STMGT. The STMGT retrieves the files from
the AMASS archive into the AMASS cache if they are not already present in the cache.

• SDSRV notifies PRONG of data (identified by UR) availability. 

Stage 3: Processing data and archiving, both data and metadata: 

• PRONG transfers the retrieved data from the Working Storage to local PDPS disk. 
(If the AMASS cache and Working Storage are on different devices, then data must 
be first moved from the former to the latter.) 

• PRONG processes the retrieved data to produce a higher level product.

• PRONG extracts metadata from the higher-level product using the Metadata Extraction
Tool, populates the target metadata template, and writes a metadata file (on MDDB Sybase).

• PDPS PRONG sends an insert request to SDSRV.

• SDSRV → requests STMGT → requests AMASS. The AMASS file manager archives
the files. Archiving is done in two steps:

— STMGT copies data from PDPS (local disk) to Working Storage via an ftp 
command. 

— data are copied from the Working Storage to AMASS cache (and then to 
AMASS archive). 

• SDSRV inserts metadata in the Metadata Database (MDDB) and then notifies 
PRONG that the archival operation has been completed. 

2.4 Assumptions 

The various software processes shown in the previous subsection were mapped into the 
different hardware configuration items for the GSFC, EDC, and LaRC DAACs. The 
following assumptions were made when developing the scalability model. 

• Processing of “Ingest data” and “Data retrieval and processing” constitute the
main load on the Data Server. Thus, we modelled only these two operations.

• We did not model users’ requests for data to be subsetted or subsampled, nor did we
model compressed data.

• In data retrieval operations, PLANG retrieves a processing plan from a database 
(e.g., Sybase). 

• The AMASS cache and the working storage may be implemented on the same disk. 

• Servers that are not potential bottlenecks were not considered in the model. Examples
include the “subscription server” and PDPS.

• We assume that the mean arrival rates of both types of requests (ingest data and data
retrieval) and the service demands of these requests at the various service stations are
available or can be easily estimated.

3 A Scalability Model 

We now describe our scalability model for the ECS’s Data Server and our methodology for
solving this model. We describe the structure of the scalability model, input parameters,
algorithms for computing parameters of the scalability model solver, and algorithms
for solving the scalability model. We also state our assumptions and the rationale
behind them.

The scalability model is based on our understanding of the architecture of ECS’s Data
Server and the Ingest and Retrieval operations described in the previous section. The sole
purpose of the model is to analyze the scalability of the Data Server, i.e., to determine
whether the current architecture of the ECS Data Server can support an increase in the
workload intensity.

3.1 A Framework for Scalability Analysis 

Figure 4 gives the structure of the scalability model. The “Scalability Model Generator”
collects information from three input files (these files define the modeling information
on the ECS’s Data Server and the workload) and processes this information to create
an output file which contains inputs to the “Scalability Model Solver”. This solver uses
queuing network [4] techniques to obtain desired performance measures such as response
times per workload, device utilizations, bottleneck indications, and queue lengths.

The first input file to the Scalability Model Generator, “Hardware Objects”, defines
the hardware resources (e.g., processors, disks, networks, and tape libraries) of the Data
Server. The second input file to the Scalability Model Generator, “Workloads and
Execution Flow”, completely characterizes the workload that drives the Data Server. The
third input file to the Scalability Model Generator, “Processes”, defines the parameters
of the software modules that will be executed on hardware servers by arriving requests
for service (i.e., the workload).

The Scalability Model Generator reads information in these three files, processes this
information, and generates an output file that contains the service demands for every
resource in the queueing network model. The service demand is the total service time of a
request of a certain workload type at a given device. The service demand does not include
any time waiting to get access to the device. Waiting times are obtained by solving
the model. The equations that form the basis of computation of service demands are
presented in Section 3.3. The Scalability Model Solver reads information about the service
demand from this file and solves the queueing network model for desired performance
measures. The underlying equations that form the basis for a solution are described in
Section 3.4.

3.2 Parameters for the Scalability Model 

The parameters used in the scalability model are: 

• $\mathcal{P}$: set of processes

• $\mathrm{NCPUs}_s$: number of processors of server s

• $\mathrm{SPint}_s$: SPECint95 rating of server s

• $\mathrm{SPfp}_s$: SPECfp95 rating of server s

• $\mathrm{TypeSP}_p$: type (e.g., int or fp) of the SPEC rating used to specify the computation
demand for process p

• $\mathrm{SP}_p$: SPEC rating of the machine used to measure the computation demand of
process p

• $\mathrm{ComputeDemand}_p$: compute demand of process p measured at a machine with
SPEC rating $\mathrm{SP}_p$, in seconds

• $\mathrm{PExec}_{p,w}$: probability that process p executes in workload w

• $\mathrm{Seek}_{d,s}$: average seek time of single disk d of server s, in seconds

• $\mathrm{Latency}_{d,s}$: average rotational latency of single disk d of server s, in seconds

• $\mathrm{TransferRate}_{d,s}$: transfer rate of single disk d of server s, in MBytes/sec

• $\mathrm{Hit}_{da,s}$: cache hit ratio for disk array da at server s

• $\mathrm{RAIDSeek}_{da,s}$: average seek time at any of the disks that compose disk array da at
server s, in seconds

• $\mathrm{RAIDLatency}_{da,s}$: average rotational latency at any of the disks that compose disk
array da at server s, in seconds

• $\mathrm{RAIDRate}_{da,s}$: transfer rate of any of the disks that compose disk array da at server
s, in Mbytes/sec

• $\mathrm{NTDrives}_{t,s}$: number of tape drives of tape library t at server s

• $\mathrm{NRobots}_{t,s}$: number of robots of tape library t at server s

• $\mathrm{Rewind}_{i,t,s}$: rewind time of tape drive i of tape library t at server s

• $\mathrm{MaxTSearch}_{i,t,s}$: maximum search time of tape drive i of tape library t at server s,
in seconds

• $\mathrm{TapeRate}_{i,t,s}$: transfer rate of tape drive i of tape library t at server s, in Mbytes/sec

• $\mathrm{Exchanges}_{t,s}$: number of tape exchanges per hour for each robot of tape library t
at server s. (Each exchange involves putting the old tape in the tape library and
loading the new tape into the tape drive.)

• $\mathrm{FilesPerMount}_{p,t,s}$: average number of files accessed per mount by process p at tape
library t at server s

• $\mathrm{FileSizePerMount}_{p,t,s}$: average size of files accessed by process p per
mount at tape library t at server s, in Kbytes

• $\mathrm{Bandwidth}_n$: bandwidth of network n, in Mbps

• $\mathrm{NType}_n$: type of network n

• $\mathrm{NumBlocks}_{d,s,p}$: number of blocks accessed by process p at single disk d at server s

• $\mathrm{BlockSize}_{d,s,p}$: block size for each access to single disk d at server s by process p, in
KBytes

• $\mathrm{NumBlocksRead}_{da,s,p}$: number of blocks read by process p from disk array da at
server s

• $\mathrm{NumBlocksWritten}_{da,s,p}$: number of blocks written by process p to disk array da at
server s

• $\mathrm{StripeUnitSize}_{da,s}$: size of the stripe unit for disk array da at server s, in Kbytes

• $\mathrm{Server}_p$: server in which process p is allocated

• $\lambda_w$: arrival rate of workload w, in requests/sec

• $\mathcal{P}_w$: set of processes executed by workload w

• $K_w = \{(p, x) \mid p \in \mathcal{P} \text{ and } x = \Pr[p \text{ is executed in workload } w]\}$: process flow
within workload w

• $\mathrm{PNet}_{n,w}$: probability that network n is traversed by workload w

• $\mathrm{Volume}_{n,w}$: total data volume transferred through network n by workload w, in
Kbytes

The input parameters for the Scalability Model Solver are: 

• $D_{i,p,w}$: average service demand of process p in workload w at device i, i.e., the total
time spent by the process at the device for workload w. This time does not include
any queuing time.

• $\lambda_w$: average arrival rate of requests of workload w arriving at ECS’s Data Server.

3.3 Algorithms for Computing the Scalability Model Solver Parameters

In this section, we derive expressions for computing service demands for workloads at
various types of devices. The service demand at a device due to a task is defined as the
product of the visit count of the task to the device and the service time of the task
per visit to the device. The service demand represents the total average time spent by
the task at the device.

3.3.1 Computation of Service Demands for Processors


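The service demand that a task in workload w presents at a server s due to the execution
of a process p is given by:

$$ D_{s,p,w} = \frac{\mathrm{ComputeDemand}_p \times \mathrm{PExec}_{p,w}}{\mathrm{ScaleFactor}(p,s)} \tag{1} $$

where

$$ \mathrm{ScaleFactor}(p,s) = \begin{cases} \mathrm{SPint}_s / \mathrm{SP}_p & \text{if } \mathrm{TypeSP}_p = \text{int} \\ \mathrm{SPfp}_s / \mathrm{SP}_p & \text{if } \mathrm{TypeSP}_p = \text{fp} \end{cases} \tag{2} $$

Since $\mathrm{ComputeDemand}_p$ is given for a processor of a certain rating, $\mathrm{ScaleFactor}(p,s)$ is
used to normalize the process service time to the speed rating of the current processor.
The service demand, $D_{s,w}$, of a workload w at the CPU of server s is then

$$ D_{s,w} = \sum_{\forall p \in \mathcal{P}_w \,\mid\, s = \mathrm{Server}_p} D_{s,p,w} \tag{3} $$

where $\mathcal{P}_w$ is the set of processes executed by workload w.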
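The processor equations (1) to (3) can be sketched in a few lines of Python. This is only an illustrative sketch: the class names, the server name "acm", the process names, and all numeric values are hypothetical and do not come from the ECS configuration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    spec_int: float  # SPint_s
    spec_fp: float   # SPfp_s


@dataclass
class Process:
    compute_demand: float  # ComputeDemand_p, seconds on the reference machine
    ref_spec: float        # SP_p, SPEC rating of the reference machine
    spec_type: str         # TypeSP_p, either "int" or "fp"
    server: str            # Server_p, name of the server hosting the process


def scale_factor(proc: Process, server: Server) -> float:
    """Eq. (2): ratio of the target server's rating to the reference rating."""
    rating = server.spec_int if proc.spec_type == "int" else server.spec_fp
    return rating / proc.ref_spec


def cpu_demand(p_exec: dict, server_name: str, servers: dict, processes: dict) -> float:
    """Eqs. (1) and (3): sum per-process demands over processes on this server.

    p_exec maps a process name to PExec_{p,w} for one workload w.
    """
    total = 0.0
    for name, prob in p_exec.items():
        proc = processes[name]
        if proc.server != server_name:
            continue  # Eq. (3) only sums processes allocated to server s
        total += proc.compute_demand * prob / scale_factor(proc, servers[server_name])
    return total


# Hypothetical configuration: one server and two processes.
servers = {"acm": Server(spec_int=10.0, spec_fp=20.0)}
processes = {
    "sdsrv": Process(compute_demand=2.0, ref_spec=5.0, spec_type="int", server="acm"),
    "extract": Process(compute_demand=3.0, ref_spec=10.0, spec_type="fp", server="acm"),
}
ingest_p_exec = {"sdsrv": 1.0, "extract": 0.5}

demand = cpu_demand(ingest_p_exec, "acm", servers, processes)
print(f"CPU service demand: {demand:.2f} s")
```

With these made-up inputs both processes run on a server twice as fast as their reference machines, so the demands are 2.0/2 = 1.0 s and (3.0 x 0.5)/2 = 0.75 s, giving 1.75 s in total.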


3.3.2 Computation of Service Demands for Single Disks 

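The service demand that a task in workload w presents to a disk d at a server s due to
the execution of a process p is given by:

$$ D_{d,s,p,w} = \mathrm{PExec}_{p,w} \times \mathrm{NumBlocks}_{d,s,p} \times \left[ \mathrm{Seek}_{d,s} + \mathrm{Latency}_{d,s} + \frac{\mathrm{BlockSize}_{d,s,p}}{\mathrm{TransferRate}_{d,s} \times 1000} \right] \tag{4} $$

The term $\mathrm{Seek}_{d,s} + \mathrm{Latency}_{d,s} + \frac{\mathrm{BlockSize}_{d,s,p}}{\mathrm{TransferRate}_{d,s} \times 1000}$ denotes the time the disk takes
to fetch one block of data.

The service demand, $D_{d,s,w}$, of a workload w at disk d of server s is then

$$ D_{d,s,w} = \sum_{\forall p \in \mathcal{P}_w \,\mid\, s = \mathrm{Server}_p} D_{d,s,p,w} \tag{5} $$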
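As a rough illustration of Eqs. (4) and (5), the following Python sketch computes the per-block time and the resulting disk demand. The function names and the numeric values are hypothetical; the factor 1000 converts the transfer rate from MBytes/sec to KBytes/sec, as in Eq. (4).

```python
def block_time(seek_s: float, latency_s: float, block_kb: float, rate_mb_per_s: float) -> float:
    """Time to fetch one block: seek + rotational latency + transfer time."""
    return seek_s + latency_s + block_kb / (rate_mb_per_s * 1000.0)


def single_disk_demand(p_exec: float, num_blocks: float, seek_s: float,
                       latency_s: float, block_kb: float, rate_mb_per_s: float) -> float:
    """Eq. (4): demand of one process p of workload w at a single disk."""
    return p_exec * num_blocks * block_time(seek_s, latency_s, block_kb, rate_mb_per_s)


# Hypothetical disk: 10 ms seek, 5 ms latency, 4 MBytes/sec, 8 KByte blocks.
disk = dict(seek_s=0.010, latency_s=0.005, block_kb=8.0, rate_mb_per_s=4.0)

# Two hypothetical processes of the same workload on the same server.
per_process = [
    single_disk_demand(p_exec=0.5, num_blocks=100, **disk),
    single_disk_demand(p_exec=1.0, num_blocks=10, **disk),
]

# Eq. (5): the workload demand is the sum over its processes on this server.
workload_demand = sum(per_process)
print(f"disk service demand: {workload_demand:.3f} s")
```

Each block costs 0.010 + 0.005 + 8/4000 = 0.017 s here, so the two processes contribute 0.85 s and 0.17 s respectively.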


3.3.3 Computation of Service Demands for Disk Arrays 

The computation of service demands for disk arrays is involved and is done in several
steps. The number of blocks that a process p reads at a disk (i.e., the number of stripe
accesses) in disk array da at server s is given by

$$ \mathrm{NumBlocksReadPerDisk}_{da,s,p} = \frac{\mathrm{NumBlocksRead}_{da,s,p} \times \mathrm{BlockSize}_{d,s,p}}{5 \times \mathrm{StripeUnitSize}_{da,s}} \tag{6} $$

The numerator denotes the total volume of information read from all five disks in
the disk array and the denominator denotes the volume of information read from all five
disks in a single stripe group access.

The service time to process each stripe request at each disk is given by the following
equation. (The first subexpression indicates that the seek time is amortized over all stripe
unit accesses.)

$$ \mathrm{ServiceTimePerDisk}_{da,s,p} = \frac{\mathrm{RAIDSeek}_{da,s}}{\mathrm{NumBlocksReadPerDisk}_{da,s,p}} + \mathrm{RAIDLatency}_{da,s} + \frac{\mathrm{StripeUnitSize}_{da,s}}{\mathrm{RAIDRate}_{da,s}} \tag{7} $$


The service demand due to read requests at a disk in disk array da at server s due
to execution of process p in workload w is given by the following equation. (Since a disk
array has a data cache, the term $(1 - \mathrm{Hit}_{da,s})$ denotes the probability that data to be read is
not available in the cache and a read access will have to be made.)

$$ \mathrm{ReadServiceDemandPerDisk}_{da,s,p,w} = \mathrm{NumBlocksReadPerDisk}_{da,s,p} \times \mathrm{ServiceTimePerDisk}_{da,s,p} \times \mathrm{PExec}_{p,w} \times (1 - \mathrm{Hit}_{da,s}) \tag{8} $$


Now the service demand, $D^{r}_{da,s,p,w}$, due to read requests at disk array da at server s
due to execution of process p in workload w is given by the following equation:

$$ D^{r}_{da,s,p,w} = \frac{H_5 \times \mathrm{ReadServiceDemandPerDisk}_{da,s,p,w}}{1 - \mathrm{USingleDisk}_{da,s,p,w}} \tag{9} $$

where $H_5 = \sum_{j=1}^{5} 1/j = 2.28$ and $\mathrm{USingleDisk}_{da,s,p,w}$ is given by Eq. (14). The term $H_5$
shows up in the expression because a read request at the disk array is complete only after
the last read at its disks is done. This approximation is based on [5].
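As a quick check of the constant used in Eq. (9), the fifth harmonic number can be expanded term by term:

$$ H_5 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} = \frac{60 + 30 + 20 + 15 + 12}{60} = \frac{137}{60} \approx 2.283 $$

which agrees with the quoted value of 2.28 to two decimal places.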

The service demand, $D^{r}_{da,s,w}$, of a workload w at the disk array da of server s is then

$$ D^{r}_{da,s,w} = \sum_{\forall p \in \mathcal{P}_w \,\mid\, s = \mathrm{Server}_p} D^{r}_{da,s,p,w} \tag{10} $$


The computation of the service demand due to write requests at disk array da at server
s due to the execution of process p in workload w is similar. The computation of the
number of blocks that a process p writes at a disk (i.e., the number of stripes written) in
the disk array da at server s is somewhat different and is given by the following equation.
(The (4/5) term in the denominator is due to the fact that a parity block is generated
for every four blocks written onto the disks. Thus 25% additional data is generated.)

$$ \mathrm{NumBlocksWrittenPerDisk}_{da,s,p} = \frac{\mathrm{NumBlocksWritten}_{da,s,p} \times \mathrm{BlockSize}_{d,s,p}}{(4/5) \times \mathrm{StripeUnitSize}_{da,s}} \tag{11} $$

$$ \mathrm{WriteServiceDemandPerDisk}_{da,s,p,w} = \mathrm{NumBlocksWrittenPerDisk}_{da,s,p} \times \mathrm{ServiceTimePerDisk}_{da,s,p} \times \mathrm{PExec}_{p,w} \tag{12} $$

$$ D^{w}_{da,s,p,w} = \frac{H_5 \times \mathrm{WriteServiceDemandPerDisk}_{da,s,p,w}}{1 - \mathrm{USingleDisk}_{da,s,p,w}} \tag{13} $$

where

$$ \mathrm{USingleDisk}_{da,s,p,w} = \mathrm{PExec}_{p,w} \times \lambda_w \times \left[ \left( \mathrm{NumBlocksReadPerDisk}_{da,s,p} + \mathrm{NumBlocksWrittenPerDisk}_{da,s,p} \right) \times \mathrm{ServiceTimePerDisk}_{da,s,p} \right] \tag{14} $$

The service demand, $D^{w}_{da,s,w}$, of a workload w at the disk array da of server s is then

$$ D^{w}_{da,s,w} = \sum_{\forall p \in \mathcal{P}_w \,\mid\, s = \mathrm{Server}_p} D^{w}_{da,s,p,w} \tag{15} $$
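The disk array steps (Eqs. (6) to (9) and (11) to (14)) can be chained as in the Python sketch below. It is a minimal illustration under stated assumptions: the function names and all numbers are hypothetical, and the transfer rate is passed in Kbytes/sec so that the last term of Eq. (7), which as printed divides Kbytes by a rate listed in Mbytes/sec, yields seconds.

```python
H5 = sum(1.0 / j for j in range(1, 6))  # harmonic number H_5, about 2.28


def blocks_read_per_disk(num_blocks_read, block_kb, stripe_unit_kb):
    """Eq. (6): stripe accesses per disk for reads (five-disk array)."""
    return num_blocks_read * block_kb / (5 * stripe_unit_kb)


def blocks_written_per_disk(num_blocks_written, block_kb, stripe_unit_kb):
    """Eq. (11): stripe accesses per disk for writes, including parity overhead."""
    return num_blocks_written * block_kb / ((4 / 5) * stripe_unit_kb)


def service_time_per_disk(raid_seek_s, raid_latency_s, stripe_unit_kb,
                          raid_rate_kb_per_s, reads_per_disk):
    """Eq. (7), with the rate in Kbytes/sec (an assumption made for unit consistency)."""
    return (raid_seek_s / reads_per_disk + raid_latency_s
            + stripe_unit_kb / raid_rate_kb_per_s)


def disk_array_demands(p_exec, arrival_rate, hit_ratio,
                       reads_per_disk, writes_per_disk, service_time):
    """Eqs. (8), (9), (12), (13), (14): read and write demands at the array."""
    u_single = p_exec * arrival_rate * (reads_per_disk + writes_per_disk) * service_time
    if u_single >= 1.0:
        raise ValueError("single-disk utilization must stay below 1")
    read_per_disk = reads_per_disk * service_time * p_exec * (1.0 - hit_ratio)
    write_per_disk = writes_per_disk * service_time * p_exec
    return (H5 * read_per_disk / (1.0 - u_single),
            H5 * write_per_disk / (1.0 - u_single))


# Hypothetical process: 100 blocks read, 8 blocks written, 64 Kbyte blocks and stripe units.
reads = blocks_read_per_disk(100, 64.0, 64.0)
writes = blocks_written_per_disk(8, 64.0, 64.0)
service = service_time_per_disk(0.010, 0.004, 64.0, 6400.0, reads)
read_demand, write_demand = disk_array_demands(
    p_exec=1.0, arrival_rate=0.5, hit_ratio=0.5,
    reads_per_disk=reads, writes_per_disk=writes, service_time=service)
print(f"read demand {read_demand:.3f} s, write demand {write_demand:.3f} s")
```

The guard on the single-disk utilization reflects the denominator of Eqs. (9) and (13), which is only meaningful while that utilization is below 1.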


3.3.4 Computation of Service Demands for Tape Libraries 

The computation of the service demands for tape drives and robots at a tape library is 
involved and is done in several steps. 

The total average seek time that a process p experiences at tape drive i in tape library
t at server s is given by the following equation. (The factor “1/2” is due to the fact that
the first file access will result in searching half the tape on the average and the factor
“1/3” shows up because the remaining file accesses will require searching 1/3 of the tape
on the average.)

$$ \mathrm{AverageSeekTime}_{i,t,s} = \mathrm{MaxTSearch}_{i,t,s} \times \left[ \frac{1}{2} + \frac{\mathrm{FilesPerMount}_{p,t,s} - 1}{3} \right] \tag{16} $$

The average tape mount time in seconds at tape drive i in tape library t at server s
is given by

$$ \mathrm{MountTime}_{i,t,s} = \frac{3600}{2 \times \mathrm{Exchanges}_{t,s}} \tag{17} $$

The time that tape drive i in tape library t at server s takes to serve a file access
request is given by

$$ \mathrm{TapeDriveServiceTime}_{i,t,s} = \mathrm{AverageSeekTime}_{i,t,s} + \frac{\mathrm{FilesPerMount}_{p,t,s} \times \mathrm{FileSizePerMount}_{p,t,s}}{\mathrm{TapeRate}_{i,t,s}} + \mathrm{Rewind}_{i,t,s} \tag{18} $$

The average robot service time is then

$$ \mathrm{RobotServiceTime}_{t,s} = 2 \times \mathrm{MountTime}_{i,t,s} \tag{19} $$


So, the service demand at tape drive i of tape library t of server s due to the
execution of process p in workload w is

$$ D^{\text{tape drive}}_{i,t,s,p,w} = \mathrm{PExec}_{p,w} \times \mathrm{TapeDriveServiceTime}_{i,t,s} / \mathrm{NTDrives}_{t,s} \tag{20} $$

The service demand at tape drive i of tape library t of server s due to workload
w is

$$ D^{\text{tape drive}}_{i,t,s,w} = \sum_{\forall p \in \mathcal{P}_w \,\mid\, s = \mathrm{Server}_p} D^{\text{tape drive}}_{i,t,s,p,w} \tag{21} $$

The average service demand at any robot of tape library t of server s due to the
execution of process p in workload w is

$$ D^{\text{robot}}_{t,s,p,w} = \mathrm{PExec}_{p,w} \times \mathrm{RobotServiceTime}_{t,s} / \mathrm{NRobots}_{t,s} \tag{22} $$

The service demand at any robot of tape library t of server s due to
workload w is

$$ D^{\text{robot}}_{t,s,w} = \sum_{\forall p \in \mathcal{P}_w \,\mid\, s = \mathrm{Server}_p} D^{\text{robot}}_{t,s,p,w} \tag{23} $$
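The tape library equations (16) to (20) and (22) can be traced with the Python sketch below. The function names and numbers are hypothetical. As an assumption for unit consistency, the file size and the tape rate are both expressed in Kbyte units (Kbytes and Kbytes/sec), since Eq. (18) as printed carries no conversion factor.

```python
def average_seek_time(max_search_s: float, files_per_mount: float) -> float:
    """Eq. (16): half the tape for the first file, a third for each later file."""
    return max_search_s * (0.5 + (files_per_mount - 1) / 3.0)


def mount_time(exchanges_per_hour: float) -> float:
    """Eq. (17): an exchange consists of two mount operations."""
    return 3600.0 / (2.0 * exchanges_per_hour)


def tape_drive_service_time(max_search_s, files_per_mount, file_size_kb,
                            tape_rate_kb_per_s, rewind_s):
    """Eq. (18): seek + transfer + rewind (rate in Kbytes/sec by assumption)."""
    transfer = files_per_mount * file_size_kb / tape_rate_kb_per_s
    return average_seek_time(max_search_s, files_per_mount) + transfer + rewind_s


def robot_service_time(exchanges_per_hour: float) -> float:
    """Eq. (19): twice the mount time, i.e. one full exchange."""
    return 2.0 * mount_time(exchanges_per_hour)


def tape_drive_demand(p_exec, drive_service_time, n_drives):
    """Eq. (20): demand per drive, spread over the drives of the library."""
    return p_exec * drive_service_time / n_drives


def robot_demand(p_exec, exchanges_per_hour, n_robots):
    """Eq. (22): demand per robot, spread over the robots of the library."""
    return p_exec * robot_service_time(exchanges_per_hour) / n_robots


# Hypothetical library: 4 drives, 2 robots, 120 exchanges per hour per robot.
drive_time = tape_drive_service_time(
    max_search_s=60.0, files_per_mount=4, file_size_kb=10000.0,
    tape_rate_kb_per_s=5000.0, rewind_s=30.0)
drive_demand = tape_drive_demand(p_exec=0.5, drive_service_time=drive_time, n_drives=4)
robot = robot_demand(p_exec=0.5, exchanges_per_hour=120.0, n_robots=2)
print(f"drive service time {drive_time} s, drive demand {drive_demand} s, robot demand {robot} s")
```

For these inputs the seek term is 60 x (1/2 + 1) = 90 s, the transfer term is 8 s, and the rewind adds 30 s, so the drive service time is 128 s; the robot service time is 3600/120 = 30 s per exchange.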


3.3.5 Computation of Service Demands for Networks 

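The service demand that workload w presents at network n is given by the following
equation. (The term $\mathrm{Volume}_{n,w} / \mathrm{Bandwidth}_n$ denotes the time taken by the network to
transfer the data for a task in workload w; the factors 8 and 1000 convert Kbytes to Kbits
and Mbps to Kbps.)

$$ D^{\text{network}}_{n,w} = \frac{\mathrm{PNet}_{n,w} \times \mathrm{Volume}_{n,w} \times 8}{\mathrm{Bandwidth}_n \times 1000} $$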
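A small Python sketch of the network demand follows. The function name and the sample values are hypothetical; the arithmetic mirrors the equation above, with the volume in Kbytes and the bandwidth in Mbps.

```python
def network_demand(p_net: float, volume_kb: float, bandwidth_mbps: float) -> float:
    """Network service demand in seconds.

    volume_kb * 8 gives Kbits; bandwidth_mbps * 1000 gives Kbits per second.
    """
    return p_net * volume_kb * 8.0 / (bandwidth_mbps * 1000.0)


# Hypothetical transfer: 125,000 Kbytes over a 100 Mbps network that is always traversed.
demand_s = network_demand(p_net=1.0, volume_kb=125000.0, bandwidth_mbps=100.0)
print(f"network service demand: {demand_s} s")
```

Here 125,000 Kbytes is 1,000,000 Kbits, and 100 Mbps is 100,000 Kbits/sec, so the transfer takes 10 seconds.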

3.4 The Scalability Model 

The scalability model uses queuing network (QN) models to determine the degree of
contention at each of the devices that compose ECS’s Data Server. The QN model used
in this case is a multiclass open QN [4] with additional approximations to handle the
case of disk arrays and to handle the instances of simultaneous resource possession that
appear when modeling automated tape libraries [3]. The QNs used also allow for load
dependent devices. Load dependent devices are used in the model to handle the following
situations:

• Symmetric multiprocessors: this case is characterized by a single queue for multiple
servers. In this case, the service rate $\mu(k)$ of the CPU as a function of the number
of requests $k$ is given by $k\mu$ for $k \le J$ and $J\mu$ for $k > J$, where $J$ is the number of
CPUs and $\mu$ is the service rate of each CPU.

• Collision-based LANs: in this case, the throughput of the LAN decreases as the
load increases due to an increase in the number of collisions. This can be modeled
by using an appropriate service rate function $\mu(k)$ as a function of the load on the
network [1].
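The load-dependent service rate of a symmetric multiprocessor described above can be written as a one-line function. The sketch below is illustrative only; the function name and the sample values (4 CPUs, 2 requests/sec per CPU) are hypothetical.

```python
def smp_service_rate(k: int, num_cpus: int, mu: float) -> float:
    """Service rate with k requests present: k*mu up to num_cpus, then flat."""
    return min(k, num_cpus) * mu


# Hypothetical 4-CPU server where each CPU completes 2 requests per second.
rates = [smp_service_rate(k, num_cpus=4, mu=2.0) for k in range(1, 7)]
print(rates)
```

The rate grows linearly while there are idle CPUs and stays at the aggregate rate of all CPUs once every CPU is busy.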

An open multiclass QN is characterized by the number $R$ of classes, the number $K$ of
devices, by a matrix $D = [D_{i,r}]$, $i = 1, \ldots, K$, $r = 1, \ldots, R$, of service demands per device
per class, and by a vector $\vec{\lambda} = (\lambda_1, \ldots, \lambda_R)$ of arrival rates per class. For each device, one
has to indicate its type. The following types of devices are allowed in the QN model:

• Delay devices: no queues are formed at these devices. 

• Queuing Load Independent (LI) devices: queues are formed at these devices but 
the service rate of the device does not depend on the number of requests queued 
for the device. 

• Load Dependent device (LD): queues are formed at these devices but the service 
rate of the device depends on the number of requests queued for the device. In 
the case of load dependent devices, one has to provide the service rate multipliers 
(see [4]) for each value of the number of customers. In most cases — this is true for 
multiprocessors and collision-based LANs — the value of the service rate multipliers 
saturates very quickly with the number of requests. Therefore, we only need to 
provide a small and finite number of service rate multipliers for each LD device. 

• Disk Array: this is a special type of device used to model disk arrays (see Fig. 5 
for a depiction of this type of device). 


The output results of an open multiclass QN are: 

• $R'_{i,r}(\vec{\lambda})$: average residence time of class r requests at device i, i.e., the total time,
including queuing and service, spent by requests of class r at device i.

• $R_r(\vec{\lambda})$: average response time of requests of class r. $R_r(\vec{\lambda}) = \sum_{i=1}^{K} R'_{i,r}(\vec{\lambda})$.

• $U_i(\vec{\lambda})$: utilization of device i.

• $n_{i,r}(\vec{\lambda})$: average number of requests of class r present at device i.

• $n_i(\vec{\lambda})$: average number of requests at device i. $n_i(\vec{\lambda}) = \sum_{r=1}^{R} n_{i,r}(\vec{\lambda})$.

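The basic equations for open multiclass QNs are (see [4]):

$$ U_{i,r}(\vec{\lambda}) = \lambda_r D_{i,r} $$

$$ U_i(\vec{\lambda}) = \sum_{r=1}^{R} U_{i,r}(\vec{\lambda}) $$

$$ n_{i,r}(\vec{\lambda}) = \frac{U_{i,r}(\vec{\lambda})}{1 - U_i(\vec{\lambda})} $$

$$ R'_{i,r}(\vec{\lambda}) = \begin{cases} D_{i,r} & \text{delay device} \\ \dfrac{D_{i,r}}{1 - U_i(\vec{\lambda})} & \text{LI device} \end{cases} $$

$$ R_r(\vec{\lambda}) = \sum_{i=1}^{K} R'_{i,r}(\vec{\lambda}) $$

$$ n_i(\vec{\lambda}) = \sum_{r=1}^{R} n_{i,r}(\vec{\lambda}) $$

The extension to LD devices is given in [4].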
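The basic open multiclass equations for delay and LI devices can be evaluated directly, as in the Python sketch below. It is not the Scalability Model Solver itself: it leaves out load dependent devices and the disk array and tape library approximations, and the function name, device layout, and all numbers are hypothetical.

```python
def solve_open_qn(demands, arrival_rates, device_types):
    """Solve an open multiclass QN with delay and load-independent (LI) devices.

    demands[i][r] is D_{i,r}, arrival_rates[r] is lambda_r, and
    device_types[i] is "delay" or "LI".
    """
    num_devices = len(demands)
    num_classes = len(arrival_rates)

    # U_i = sum over r of lambda_r * D_{i,r}
    utilization = [
        sum(arrival_rates[r] * demands[i][r] for r in range(num_classes))
        for i in range(num_devices)
    ]

    residence = [[0.0] * num_classes for _ in range(num_devices)]
    for i in range(num_devices):
        if device_types[i] == "delay":
            residence[i] = list(demands[i])  # no queueing at delay devices
        elif device_types[i] == "LI":
            if utilization[i] >= 1.0:
                raise ValueError(f"device {i} is saturated (U = {utilization[i]:.2f})")
            residence[i] = [d / (1.0 - utilization[i]) for d in demands[i]]
        else:
            raise ValueError(f"unsupported device type: {device_types[i]}")

    # R_r = sum over i of R'_{i,r}
    response = [
        sum(residence[i][r] for i in range(num_devices)) for r in range(num_classes)
    ]
    return utilization, residence, response


# Hypothetical model: a CPU and a disk (both LI) plus one delay device, two classes.
demands = [[0.2, 0.1], [0.5, 0.0], [0.3, 0.3]]
arrival_rates = [1.0, 2.0]
device_types = ["LI", "LI", "delay"]

utilization, residence, response = solve_open_qn(demands, arrival_rates, device_types)
print("utilizations:", utilization)
print("response times:", response)
```

In this example the disk has the highest utilization among the queuing devices (0.5), so it would be the first candidate bottleneck as the arrival rates grow. For the delay device, the computed value of the utilization sum is the mean number of requests in service rather than a busy fraction.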


4 Concluding Remarks 

In this paper, we derived the algorithms and expressions to be used to convert data
describing the software and hardware architecture of ECS’s Data Server into a scalability 
model. The model will be used to verify how well the Data Server supports an increase in 
workload intensity while maintaining reasonable performance. The scalability model is 
based on queuing network models that are automatically generated from the description 
of the architecture and the workload.

Figure 3: A Flow Diagram of Data Retrieval and Processing.

Acronyms used in Figure 3:
MDDB: Metadata Database
SDSRV: Science Data Server
PLANG: Production Planning CSCI
PRONG: Processing CSCI
PDPS: Product Development and Processing System
DDIST: Data Distribution Services CSCI
STMGT: Storage Management software CSCI
AMASS: Archive Management and Storage System
CERES: Clouds and Earth’s Radiant Energy System

Figure 4: Scalability Model Framework.